Towards Optimal and Expressive Kernelization 
for d-Hitting Set 



René van Bevern*

Institut für Softwaretechnik und Theoretische Informatik, TU Berlin, Germany,
rene.vanbevern@tu-berlin.de



Abstract. A sunflower in a hypergraph is a set of hyperedges pairwise
intersecting in exactly the same vertex set. Sunflowers are a useful tool in
polynomial-time data reduction for problems formalizable as d-Hitting Set,
the problem of covering all hyperedges (of cardinality at most d) of a
hypergraph by at most k vertices. Additionally, in fault diagnosis, sunflowers
yield concise explanations for "highly defective structures". We provide a
linear-time algorithm that, by finding sunflowers, transforms an instance of
d-Hitting Set into an equivalent instance comprising at most O(k^d) hyperedges
and vertices. In terms of parameterized complexity theory, we show
a problem kernel with asymptotically optimal size (unless coNP ⊆ NP/poly).
We show that the number of vertices can be reduced to O(k^{d-1}) with
additional processing in O(k^{1.5d}) time, nontrivially combining the sunflower
technique with known problem kernels due to Abu-Khzam and Moser.

1 Introduction 

Many practically relevant problems like the examples given below boil down to 
solving the following NP-hard problem: 

d-Hitting Set

Input: A hypergraph H = (V, E) with hyperedges of cardinality at most d and a
natural number k.
Question: Is there a hitting set S ⊆ V with |S| ≤ k and e ∩ S ≠ ∅ for all e ∈ E?
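
To make the problem statement concrete, the following minimal sketch decides small instances by exhaustive search. It is an illustration only and not part of the kernelization developed in this work; the function names and the toy instance are hypothetical.

```python
from itertools import combinations

def is_hitting_set(edges, candidate):
    """Return True if `candidate` intersects every hyperedge."""
    chosen = set(candidate)
    return all(chosen & set(e) for e in edges)

def find_hitting_set(vertices, edges, k):
    """Exhaustively search for a hitting set of size at most k (None if absent)."""
    for size in range(k + 1):
        for candidate in combinations(sorted(vertices), size):
            if is_hitting_set(edges, candidate):
                return set(candidate)
    return None

# Hypothetical 3-Hitting Set instance, for illustration only.
V = {1, 2, 3, 4, 5}
E = [{1, 2, 3}, {3, 4}, {4, 5}]
small = find_hitting_set(V, E, 1)   # no single vertex hits all three edges
larger = find_hitting_set(V, E, 2)  # vertices 1 and 4 together hit every edge
```

The search takes time exponential in k, which is why shrinking the instance beforehand by data reduction is worthwhile.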

Examples of NP-hard problems that can be encoded as d-Hitting Set arise in the
following fields.

1. Fault diagnosis: The task is to detect faulty components of a malfunctioning 
system. To this end, those sets of components are mapped to hyperedges of 
a hypergraph that are assumed to contain at least one broken component [1, 
14, 21]. By the principle of Occam's Razor, a small hitting set is then a likely 
explanation of the malfunction. 

2. Data clustering: all optimization problems in the complexity classes MIN F^+Π_1
and MAX NP, including (s-Plex) Cluster Vertex Deletion [5, 13] and
all problems of establishing by means of vertex deletion a graph property
characterized by forbidden induced subgraphs with at most d vertices, can be
formalized as d-Hitting Set [15].

Fig. 1. Illustrations for Example 1. (a) A boolean circuit with circle nodes
representing gates and square nodes representing input and output nodes.
(b) Sets containing at least one faulty gate, found by the analysis of the circuit.

All problems have in common that a large number of "conflicts" (the possibly
O(|V|^d) hyperedges in a d-Hitting Set instance) is caused by a small number of
components (the hitting set S), whose removal or repair could fix a broken system
or establish a useful property. Often, however, one seeks not only a solution to a
problem but also a concise explanation of why a solution should have the given form.
In this work, we contribute to this by combining concise explanations with data
reduction, wherein our data reduction preserves the possibility of finding optimal
solutions and gives a performance guarantee. This kind of data reduction, more
specifically problem kernels, is a powerful tool in attacking NP-hard problems
like d-Hitting Set [6, 11]. Our main ingredient and contribution are efficient
algorithms to find sunflowers, a structure first observed by Erdős and Rado [9]:

Definition 1. In a hypergraph (V, E), a sunflower is a set of petals P ⊆ E such
that every pair of sets in P intersects in exactly the same set C ⊆ V, called the core
(possibly, C = ∅). The size of the sunflower is |P|.
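
Definition 1 can be checked directly on a small family of sets. The sketch below is a plain quadratic-time test, not the linear-time method developed later; the function name is hypothetical, and the sets are those of the boolean circuit in Example 1, with gate names written as strings.

```python
from itertools import combinations

def sunflower_core(petals):
    """Return the core if `petals` forms a sunflower, otherwise None."""
    petals = [frozenset(p) for p in petals]
    if not petals:
        return None
    # The only possible core is the intersection of all petals.
    core = frozenset.intersection(*petals)
    for p, q in combinations(petals, 2):
        if p & q != core:
            return None
    return core

# Sets S_1, ..., S_5 of Example 1.
S1 = {"v1", "v2", "w1"}
S2 = {"v2", "v3", "w2"}
S3 = {"v3", "v4", "w3"}
S4 = {"v3", "v4", "w4"}
S5 = {"v3", "v4", "w5"}

core_345 = sunflower_core([S3, S4, S5])  # a sunflower with core {v3, v4}
core_123 = sunflower_core([S1, S2, S3])  # not a sunflower
core_15 = sunflower_core([S1, S5])       # disjoint sets: sunflower with empty core
```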

A sunflower with k + 1 petals in a d-Hitting Set instance yields a concise
explanation of why one of the elements in its core should be removed or repaired:
For every sunflower P with at least k + 1 petals, every hitting set of size at most k
contains one of P's core-elements, since it cannot contain an element of each of the
k + 1 petals. Analyzing the petals of this sunflower could guide the 'causal analysis'
of a problem. We illustrate this using an example.

Example 1. Figure 1(a) shows a boolean circuit. It gets as input a 4-bit string
x = x_1 ... x_4 and outputs a 5-bit string f(x) = f_1(x) ... f_5(x). The nodes drawn
as circles represent boolean gates, which output some bit depending on their two
input bits. They might, for example, represent the logical operations "∧" or "∨".

Assume that all output bits of f(x) are observed to be the opposite of what
would have been expected by the designer of the circuit. We want to identify broken
gates. For each wrong output bit f_i(x), we obtain a set S_i of gates for which we
know that at least one is broken, because f_i(x) is wrong. That is, S_i contains
precisely those gates that have a directed path to f_i(x) in the graph shown in
Figure 1(a). We obtain the sets illustrated in Figure 1(b):

    S_1 = {v_1, v_2, w_1},   S_2 = {v_2, v_3, w_2},   S_3 = {v_3, v_4, w_3},
    S_4 = {v_3, v_4, w_4},   S_5 = {v_3, v_4, w_5}.

The sets S_1 and S_5 are disjoint. Therefore, the wrong output is not explainable
by only one broken gate. We therefore assume that there are two broken
gates and search for a hitting set of size k = 2 in the hypergraph with the vertices
v_1, ..., v_4, w_1, ..., w_5 and hyperedges S_1, ..., S_5. The set {S_3, S_4, S_5} is a
sunflower of size k + 1 = 3 with core {v_3, v_4}. Therefore, the functionality of the
gates v_3 and v_4 must be checked. If, in contrast to our expectation, both gates
v_3 and v_4 turn out to be working correctly, the usefulness of the sunflower becomes
even more apparent: it is immediately clear not only that at least three gates are
broken, but also which gates have to be checked for malfunctions next:
w_3, w_4, and w_5. □

In addition to fault diagnosis, sunflowers also yield a good tool for data reduction
preserving optimal solutions, so that we can remove hyperedges and vertices from
the input hypergraph until it is small enough to be analyzed as a whole. This can
be seen as follows. For every sunflower P with at least k + 1 petals, every hitting set
of size at most k contains one of P's core-elements. Therefore, we can repeatedly
discard a petal of a sunflower of size k + 2 from the hypergraph, yielding a
decision-equivalent d-Hitting Set instance whose largest sunflower has k + 1 petals [15].
This, by the sunflower lemma of Erdős and Rado [9], implies that the resulting
hypergraph has O(k^d) hyperedges [10, Theorem 9.8], therefore showing that this
form of data reduction yields a problem kernel [6, 11].

Previous work. Downey and Fellows [8] showed that Hitting Set is W[2]-complete
with respect to the parameter k when the cardinality of the hyperedges is
unbounded. Hence, unless FPT = W[2], it has no problem kernel. Various problem
kernels for d-Hitting Set have been developed [2, 10, 15, 16, 18, 19]. However,
the problem kernels aiming for efficiency faced some problems: Niedermeier and
Rossmanith [18] showed a problem kernel for 3-Hitting Set of size O(k^3). They
implicitly claimed that a polynomial-size problem kernel for d-Hitting Set is
computable in linear time, without giving a proof for the running time. Nishimura
et al. [19] claimed that a problem kernel with O(k^{d-1}) vertices is computable in
O(k(|V| + |E|) + k^d) time; their algorithm, however, does not always yield correct
problem kernels [2]. The problem kernels of Flum and Grohe [10] and Kratsch [15]
exploit the sunflower lemma by Erdős and Rado [9] and therefore yield concise
explanations of why certain vertices should be part of optimal solutions. However,
their running times are only analyzed to be polynomial in the input size. Abu-Khzam [2]
showed a problem kernel with O(k^{d-1}) vertices for d-Hitting Set, thus proving
the previously claimed result of Nishimura et al. [19] on the number of vertices
in the problem kernel. The problem kernel of Abu-Khzam [2] may still comprise
Ω(k^{2d-2}) hyperedges.¹ Dell and van Melkebeek [7] showed that the existence of
a problem kernel with O(k^{d-ε}) hyperedges for any ε > 0 would imply
coNP ⊆ NP/poly and a collapse of the polynomial-time hierarchy to the third level.
Therefore, a problem kernel with O(k^{d-ε}) hyperedges is presumed not to exist.

Our results. We show that a problem kernel for d-Hitting Set with O(k^d)
hyperedges and vertices is computable in linear time. Thereby, we prove the previously
claimed result by Niedermeier and Rossmanith [18] and complement recent
results in improving the efficiency of kernelization algorithms [4, 12, 20]. In contrast
to many other problem kernels [2, 16, 18, 19], our algorithm outputs sunflowers
to guide fault diagnosis. Additionally, using ideas from Abu-Khzam [2] and
Moser [16], we show that the number of vertices can be further reduced to O(k^{d-1})
with an additional amount of O(k^{1.5d}) time. Summarizing, by merging these
techniques, we can compute in O(|V| + |E| + k^{1.5d}) time a problem kernel comprising
O(k^d) hyperedges and O(k^{d-1}) vertices.

Preliminaries. A hypergraph H = (V, E) consists of a set of vertices V and a set
of (hyper)edges E, where each hyperedge in E is a subset of V. In a d-uniform
hypergraph every edge has cardinality exactly d. A 2-uniform hypergraph is a
graph. A hypergraph G = (V', E') is a subgraph of its supergraph H if V' ⊆ V and
E' ⊆ E. A set S ⊆ V intersecting every set in E is a hitting set. A parameterized
problem is a subset L ⊆ Σ* × ℕ [8, 10, 17]. A problem kernel for a parameterized
problem L is a polynomial-time algorithm that, given an instance (I, k), computes
an instance (I', k') such that |I'| + k' ≤ f(k) and (I', k') ∈ L ⇔ (I, k) ∈ L. Herein,
the function f is called the size of the problem kernel and depends only on k.

2 A Linear-Time Problem Kernel for d-Hitting Set 

This section shows a linear-time problem kernel for d-Hitting Set comprising
O(k^d) edges. That is, we show that a hypergraph H can be transformed in linear
time to a hypergraph G such that G has O(k^d) edges and allows for a hitting set
of size k if and only if H does. In Section 3, we show how to shrink the number of
vertices to O(k^{d-1}).

Theorem 1. d-Hitting Set admits a linear-time computable problem kernel
comprising O(k^d) hyperedges and vertices.

Until now, problem kernels based on the sunflower lemma by Erdős and Rado [9]
proceed in the following fashion [10, 15]: repeatedly (i) find a sunflower of size k + 1
in the input hypergraph and (ii) delete redundant petals until no more sunflowers of
size k + 1 exist. This approach has the drawback of finding only one sunflower at a
time and restarting the process from the beginning.

¹ Although not directly given in the work of Abu-Khzam [2] itself, this can be seen as follows:
the kernel comprises vertices of each hyperedge in a set W of pairwise "weakly related"
hyperedges and an independent set I. In the worst case, |W| = k^{d-1} and |I| = dk^{d-1}
and each hyperedge in W has d subsets of size d - 1. Then, each subset can constitute a
hyperedge with each vertex in I and the kernel has Ω(k^{2d-2}) hyperedges.

Algorithm 1: Linear-Time Kernelization for d-Hitting Set
Input: Hypergraph H = (V, E), natural number k.
Output: Hypergraph G = (V', E') with |E'| ∈ O(k^d).

 1 foreach e ∈ E do                            // Initialization for each edge
 2     foreach C ⊆ e do                        // Initialization for all possible cores of sunflowers
 3         petals[C] ← 0;                      // No petals found for sunflower with core C yet
 4         foreach v ∈ e do
 5             used[C][v] ← false;             // No vertex in a petal of a sunflower with core C yet
 6 E' ← ∅;
 7 foreach e ∈ E do delete all e' ⊋ e from E;  // Every vertex that hits e also hits e'
 8 foreach e ∈ E do
 9     if ∀C ⊆ e: petals[C] ≤ k then
10         E' ← E' ∪ {e};
11         foreach C ⊆ e do                    // Consider all possible cores for the petal e
12             if ∀v ∈ e \ C: used[C][v] = false then
13                 petals[C] ← petals[C] + 1;
14                 foreach v ∈ e \ C do used[C][v] ← true;
15 V' := ⋃_{e ∈ E'} e;
16 return (V', E');



In contrast, to prove Theorem 1, we construct a subgraph G = (V', E') of a
given hypergraph H = (V, E) not by edge deletion; instead, we follow a bottom-up
approach that allows us to "grow" many sunflowers in G simultaneously, stopping
"growing sunflowers" if they become too large. Algorithm 1 repeatedly (after some
initialization work in lines 1-7) in line 10 copies a hyperedge e from H to the
initially empty G unless we find in line 9 that e contains the core C of a sunflower of
size k + 1 in G. To check this, the number of petals found for a core C is maintained
in a data structure petals[C]. If we find that an edge e is suitable as petal of a
sunflower with core C in line 12, then we mark the vertices in e \ C as "used" for the
core C in line 14. This information is maintained by setting "used[C][v] ← true". In
this way, vertices in e \ C are not considered again for finding petals for the core C
in line 12, therefore ensuring that additionally found petals intersect e only in C.

We now prove the correctness and running time of Algorithm 1, which will, 
together with the result that the output graph contains no large sunflowers, provide 
a proof of Theorem 1. Note that, by storing in petals[C] a list of found petals, they 
can serve as concise explanations of why a small hitting set contains vertices of C. 
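
The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the linear-time implementation analyzed below: hash-based dictionaries replace the trie, the removal of supersets (line 7) is done by plain subset enumeration, and petals[C] stores the list of found petals rather than a counter. The names `subsets` and `kernelize` and the toy instance are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def subsets(edge):
    """Yield all subsets of `edge`, including the empty set and `edge` itself."""
    items = sorted(edge)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def kernelize(edges, k):
    """Sketch of Algorithm 1; returns (V', E', petals)."""
    edges = {frozenset(e) for e in edges}
    # Line 7: drop every edge that properly contains another edge.
    minimal = [e for e in edges
               if not any(c in edges for c in subsets(e) if c != e)]
    petals = defaultdict(list)  # core -> petals found so far (lines 1-3)
    used = defaultdict(set)     # core -> vertices marked as used (lines 4-5)
    kept = []                   # E' (line 6)
    for e in sorted(minimal, key=sorted):  # fixed order, for reproducibility only
        cores = list(subsets(e))
        if all(len(petals[c]) <= k for c in cores):  # line 9
            kept.append(e)                           # line 10
            for c in cores:                          # line 11
                if not (e - c) & used[c]:            # line 12
                    petals[c].append(e)              # line 13
                    used[c] |= e - c                 # line 14
    vertices = set().union(*kept)                    # line 15
    return vertices, kept, petals

# Hypothetical instance: four edges share vertex 1, and {6, 7, 8} contains {6, 7}.
H = [{1, 2}, {1, 3}, {1, 4}, {1, 5}, {6, 7}, {6, 7, 8}]
V1, E1, found = kernelize(H, 1)
```

With k = 1, the edges {1, 4} and {1, 5} are skipped because the core {1} already has k + 1 = 2 petals, and found[frozenset({1})] lists these two petals as the explanation of why vertex 1 must be in any hitting set of size at most 1.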

Lemma 1. The hypergraph G returned by Algorithm 1 on input H admits a hitting
set of size k if and only if H does.

Proof. We first show that, if H admits a hitting set of size k, then so does G. For
every hitting set S for H = (V, E), the set S' := S ∩ V' is a hitting set for G = (V', E')
with |S'| ≤ |S|: the set S contains an element of every edge in E and, since E' ⊆ E
and V' = ⋃_{e ∈ E'} e, the set S' contains an element of every edge in E'. It remains to
show that if G admits a hitting set of size k, then so does H. Assume that S is a
hitting set of size k for G. Obviously, all edges that H and G have in common are
hit in H by S. It remains to show that every edge e in H that is not in G is also hit.

First, consider the case where e was not deleted in line 7. Then, adding e to G
in line 10 of Algorithm 1 has been skipped, because the condition in line 9 is false.
That is, petals[C] ≥ k + 1 for some C ⊆ e. Consequently, for this particular C, a
sunflower P with k + 1 petals and core C exists in G, since we only increment
petals[C] in line 13 if we find e to be suitable as additional petal for a sunflower
with core C in line 12. Note that C ≠ ∅, because otherwise k + 1 pairwise disjoint
edges would exist in G, contradicting our assumption that S is a hitting set of
size k for G. Since |S| ≤ k, we have S ∩ C ≠ ∅, as discussed in the introduction.
Therefore, since C ⊆ e and C ⊆ V, the edge e is hit by S also in H.

Second, in the case where e was removed in line 7, e is also hit by S, because
e has a sub-edge e' ⊊ e that either is contained in G, and hence hit by S, or was
skipped in line 9, in which case e' is hit by the previous case. We conclude that S
is a hitting set of size k also for H. □

Lemma 2. Given a hypergraph H = (V, E) and a natural number k, Algorithm 1
can be implemented to run in O(d|V| + 2^d d · |E|) time.

Proof. We first describe the data structure that is used to maintain petals[C] and
used[C][v], then its initialization in lines 1-6, then the implementation of lines
8-15, and finally that of line 7.

We assume that every vertex is represented as an integer in {1, ..., |V|} and
that every edge is represented as a sorted array. We can initially sort all edges of H
in O(|E| d log d) total time. Then, the set subtraction operation needed in line 12
can be executed in O(d) time such that the resulting set is again sorted. Moreover,
we can generate all subsets of a sorted set such that the resulting subsets are
sorted. Hence, we can assume that we always deal with sorted edges and thus obtain
a canonical representation of an edge as a character string of length at most d over
the alphabet V. This enables us to maintain petals[C] and used[C][v] in a trie: a trie is
a tree-like data structure in which, when each of its inner nodes is implemented as
an array, a value associated with a character string X can be looked up and stored
in O(|X|) time [3, Section 5.3]. Hence, we can look up and store values associated
with a set C in O(d) time. We use such a trie to associate with some sets C ⊆ V,
|C| ≤ d, an integer petals[C] and a pointer to a vector used[C][] of length |V|.

For the initial creation of the trie in lines 1-6, we do not initialize every cell of the
array that implements an inner node of the trie, as this would take O(|V|) time
for each non-empty node. However, we have to initialize all cells that will be
accessed: otherwise, it would be unknown whether a cell contains a pointer to another
node or random data. We achieve this as follows: in lines 1-6, we obtain a list L
of length 2^d · |E| of all possible sets C ⊆ e for all e ∈ E. We will only associate values
with sets in L, and therefore initialize the inner nodes of the trie to only hold
values associated with sets in L. This works in O(d|V| + 2^d d · |E|) time, since the
representation of sets in L as strings of length at most d over the alphabet V enables
us to sort L in O(d(|V| + |L|)) = O(d|V| + 2^d d · |E|) time using Radix Sort
[3, Section 8.3]. We build the trie by iterating over L once: in each iteration, we
check in O(d) time in which positions the character string for a set C differs from
the character string of its predecessor set in L. This tells us which array entries of
the inner nodes of the trie have to be newly initialized, and which nodes in the trie
on the path to the leaf corresponding to C have been previously initialized and must
not be overwritten. Hence, we can implement lines 1-6 to run in
O(d|V| + 2^d d · |E|) time, observing that line 5 can be implemented to run in O(d)
time, as only one look-up to used[C][] is needed to obtain an array, in which then
O(d) necessary values are initialized.

The for-loop in line 8 iterates |E| times. Its body works in O(2^d d) time:
obviously, this time bound holds for lines 9 and 10; it remains to show that the body
of the for-loop in line 11 works in O(d) time. This is easy to see if one considers
that, in lines 12 and 14, one only has to do one look-up to used[C][] to find an array
that holds the values for the at most d vertices v ∈ e. Also line 15 works in linear
time by first initializing all entries of an array vertices[] of size |V| to "false". Then,
for each edge e ∈ E' and each vertex v ∈ e, set "vertices[v] ← true" in O(d) time.
Afterward, let V' be the set of vertices v for which vertices[v] = true. This takes
O(|V| + d|E|) time.

It remains to discuss the running time of line 7. Similarly as in lines 8-14, we
iterate over all edges e ∈ E, and for all proper subsets e' ⊊ e add a pointer to the
position of e in E to the list supersets[e'] (associated with e' using a trie). It then
remains to remove the edges in supersets[e'] from E for each edge e' ∈ E. □
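
The look-up structure used in this proof can be sketched as follows. This is a simplified illustration under the assumption that nested hash maps stand in for the arrays of the inner nodes, so it does not reproduce the array initialization trick on which the linear running time relies; the class name `SetTrie` is hypothetical.

```python
_VALUE = object()  # sentinel key under which a node stores its associated value

class SetTrie:
    """Trie over sorted vertex sequences; nested dicts stand in for arrays."""

    def __init__(self):
        self.root = {}

    def _node(self, vertex_set, create):
        node = self.root
        for v in sorted(vertex_set):  # canonical sorted representation of the set
            if v not in node:
                if not create:
                    return None
                node[v] = {}
            node = node[v]
        return node

    def store(self, vertex_set, value):
        """Associate `value` with `vertex_set` in one pass over its vertices."""
        self._node(vertex_set, create=True)[_VALUE] = value

    def lookup(self, vertex_set, default=None):
        """Return the value associated with `vertex_set`, or `default`."""
        node = self._node(vertex_set, create=False)
        if node is None:
            return default
        return node.get(_VALUE, default)

petal_count = SetTrie()
petal_count.store({3, 1}, 2)           # petals[{1, 3}] <- 2
count_13 = petal_count.lookup({1, 3})  # same set, different order of elements
count_1 = petal_count.lookup({1}, 0)   # inner node exists, but no value stored
```

Because sets are traversed in sorted order, a look-up or store for a set C walks a path of |C| nodes, mirroring the O(d) bound used in the proof once edges are kept sorted.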

We now show that there is an upper bound on the size of the sunflowers in the
graph output by Algorithm 1. This enables us to upper-bound the size of the output
graph similarly to how the sunflower lemma of Erdős and Rado [9] is used in the
d-Hitting Set kernel of Flum and Grohe [10, Theorem 9.8].

Lemma 3. Given a hypergraph H = (V, E) and a natural number k, Algorithm 1
outputs a hypergraph G whose largest sunflower has at most d(k + 1) petals.

Proof. Let P be a sunflower with core C in G. If C ∈ P, then |P| = 1 because of
line 7 of Algorithm 1. If C ∉ P, the following two observations yield |P| ≤ d(k + 1):

(i) Every petal e ∈ P present in G is copied from H in line 10 of Algorithm 1.
Consequently, every petal e ∈ P contains a vertex v satisfying used[C][v] = true: if
this were not already the case in line 12, then line 14 applies "used[C][v] ← true"
to all vertices v ∈ e \ C.

(ii) Whenever petals[C] is incremented by one in line 13, then, in line 14,
"used[C][v] ← true" is applied to the at most d vertices v ∈ e \ C. Thus, since always
petals[C] ≤ k + 1, at most d(k + 1) vertices v satisfy used[C][v] = true. Moreover,
since, by line 14, no v ∈ C satisfies used[C][v] = true and the petals in P pairwise
intersect only in C, it follows that at most d(k + 1) petals in P contain vertices
satisfying used[C][v] = true. □

The last ingredient in the proof of Theorem 1 is the sunflower lemma by Erdős and
Rado [9]. In a similar way as Flum and Grohe [10, Lemma 9.7], we can show the
following refined version, which we need for Section 3. Note that, for b = 1, this is
exactly the sunflower lemma [10].

Lemma 4. Let H = (V, E) be an ℓ-uniform hypergraph, b, c ∈ ℕ, and b ≤ ℓ such
that every pair of edges in H intersects in at most ℓ - b vertices. If H contains more
than ℓ! · c^{ℓ+1-b} edges, then H contains a sunflower with more than c petals.

We finally have all ingredients to show that d-Hitting Set admits a linear-time
computable O(k^d)-size problem kernel, thus proving Theorem 1.

Proof (Theorem 1). Lemma 1 and Lemma 2 show that Algorithm 1 executes linear-time
data reduction such that the input and output graph are equivalent with
respect to d-Hitting Set. It remains to show that the graph G output by
Algorithm 1 comprises at most d! · d^{d+1} · (k + 1)^d ∈ O(k^d) edges. This then also
implies that G has O(k^d) vertices, as the vertex set of G is constructed as the union
of its edges in line 15 of Algorithm 1.

To bound the number of edges, consider for 1 ≤ ℓ ≤ d the ℓ-uniform hypergraph
G_ℓ = (V_ℓ, E_ℓ) comprising only the edges of size ℓ of G. If G had more than
d! · d^{d+1} · (k + 1)^d edges, then, for some ℓ ≤ d, G_ℓ would have more than
d! · d^d · (k + 1)^d edges. This, however, leads to a contradiction with Lemma 3:
Lemma 4 with b = 1 and c = d(k + 1) states that if G_ℓ had more than
ℓ! · d^ℓ · (k + 1)^ℓ edges, then G_ℓ would contain a sunflower with more than
d(k + 1) petals. This sunflower would also exist in the supergraph G of G_ℓ. □
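
Read in the forward direction, the pigeonhole step of this proof amounts to the following chain of inequalities. It is only a restatement of the argument above, using Lemma 3 and Lemma 4 with b = 1 and c = d(k + 1), and the fact that ℓ! · (d(k + 1))^ℓ does not decrease with ℓ:

```latex
|E'| \;=\; \sum_{\ell=1}^{d} |E_\ell|
     \;\le\; \sum_{\ell=1}^{d} \ell! \,\bigl(d(k+1)\bigr)^{\ell}
     \;\le\; d \cdot d! \, d^{d} (k+1)^{d}
     \;=\; d! \, d^{d+1} (k+1)^{d} \;\in\; O(k^{d}).
```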

3 Reducing the Number of Vertices to O(k^{d-1})

In this section, we combine our problem kernel shown in Section 2 with techniques
from Abu-Khzam [2] and Moser [16, Section 7.3]. Thus, we obtain a problem kernel
for d-Hitting Set comprising O(k^d) edges and O(k^{d-1}) vertices in
O(|V| + |E| + k^{1.5d}) time. To this end, we first briefly sketch the running-time
bottleneck of the kernelization idea of Abu-Khzam [2], which is also a bottleneck in
the algorithm of Moser [16].

The problem kernels of Abu-Khzam [2] and Moser [16, Section 7.3]. Given a
hypergraph H = (V, E) and a natural number k, Abu-Khzam [2] first computes a
maximal weakly related set W, where data reduction ensures |W| ≤ k^{d-1}:

Definition 2 ([2]). A set W ⊆ E is weakly related if every pair of edges in W
intersects in at most d - 2 vertices.

Whether a given edge e can be added to a set W of weakly related edges can
be checked in O(d|W|) time. After adding e, data reduction on W is executed in
O(2^d |W| log |W|) time. Hence, since always |W| ≤ k^{d-1}, Abu-Khzam [2] can
compute W in O(2^d k^{d-1} log k · |E|) time.

Since |W| ≤ k^{d-1}, it remains to bound the size of the set I of vertices not
contained in edges of W. The set I is an independent set, that is, I contains no
pair of vertices occurring in the same edge [2]. A bipartite graph B = (I ⊎ S, E') is
constructed, where S := {e ⊆ V | ∃v ∈ I: ∃w ∈ W: e ⊆ w, {v} ∪ e ∈ E} and
E' := {{v, e} | v ∈ I, e ∈ S, {v} ∪ e ∈ E}. Whereas Abu-Khzam [2] shrinks the size
of I using so-called crown reductions, Moser [16, Lemma 7.16] shows that it is
sufficient to compute a maximum matching in B and to remove unmatched vertices
in I from H together with the edges containing them. The bound on the number of
vertices in the problem kernel is thus O(k^{d-1}), since |W| ≤ k^{d-1}, and therefore
|I| ≤ |S| ≤ dk^{d-1}.

Algorithm 2: Linear-time computation of a maximal weakly related set
Input: Hypergraph H = (V, E), natural number k.
Output: Maximal weakly related set W.

 1 W ← ∅;
 2 foreach e ∈ E do                            // Initialization for each edge
 3     foreach C ⊆ e, |C| = d - 1 do
 4         intersection[C] ← false;            // No edges in W contain C yet.
 5         intersection[e \ C] ← false;        // The vertex in e \ C is not in W yet. We use
                                               // this later to compute an independent set.
 6 foreach e ∈ E do
 7     if ∀C ⊆ e, |C| = d - 1: intersection[C] = false then
 8         W ← W ∪ {e};
 9         foreach C ⊆ e, |C| = d - 1 do
10             intersection[C] ← true;
11             intersection[e \ C] ← true;
12 return W;

Our improvements. Given a hypergraph H = (V, E) and a natural number k, we
can first compute our problem kernel in O(|V| + |E|) time, leaving O(k^d) edges
in H. Afterward applying the problem kernel of Abu-Khzam [2] would reduce
the number of vertices to O(k^{d-1}). However, the computation of the maximal
weakly related set on our reduced instance already takes
O(2^d k^{d-1} log k · |E|) = O(k^{2d-1} log k) additional time, as discussed above.
We improve the running time of this step in order to show:

Theorem 2. d-Hitting Set admits a problem kernel comprising O(k^d) hyperedges
and O(k^{d-1}) vertices computable in O(|V| + |E| + k^{1.5d}) time.

To prove Theorem 2, we compute a maximal weakly related set W in linear time.
Then, we show that our problem kernel already ensures |W| ∈ O(k^{d-1}) and that
further data reduction on W is therefore unnecessary. This makes finding a
maximum matching the new bottleneck of the kernelization described by Moser [16,
Section 7.3].

Lemma 5. Given a hypergraph H = (V, E), a maximal weakly related set is
computable in O(d|V| + d^2 · |E|) time.

To prove Lemma 5, we employ Algorithm 2. 

Proof. First, observe that the set W returned in line 12 of Algorithm 2 is indeed
weakly related: let w_1 ≠ w_2 ∈ E intersect in more than d - 2 vertices and assume
that w_1 is added to W in line 8. Let C := w_1 ∩ w_2. Obviously, |C| = d - 1. Hence, when
w_1 is added to W, then we apply "intersection[C] ← true" in line 10. Therefore,
when e = w_2 is considered in line 6, the condition in line 7 does not hold, which
implies that w_2 is not added to W in line 8. In the same way it follows that every
edge is added to W that does not intersect any edge of W in more than d - 2 vertices.
Therefore, W is maximal.

We first sort all edges of H in O(|E| d log d) time. Using the trie data structure
and initialization method as used in Lemma 2, we can do each look-up of a value
intersection[C] in O(d) time if C is the result of a set subtraction operation of two
sorted sets. We initialize the trie to associate values with at most 2d · |E| sets.
Hence, as discussed in Lemma 2, the initialization in lines 1-5 can be done in
O(d|V| + d^2 · |E|) time. Finally, for every edge, the body of the for-loop in line 6 can
be executed in O(d^2) time doing O(d)-time look-ups for each of the 2d · |E| sets. □
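
The marking scheme of Algorithm 2 can be sketched as follows. As an assumption of this sketch, a hash-based dictionary replaces the trie, so the explicit initialization in lines 1-5 becomes unnecessary and the stated worst-case time bound is not reproduced; the function name and the toy instance are hypothetical.

```python
from itertools import combinations

def maximal_weakly_related(edges, d):
    """Sketch of Algorithm 2; returns (W, intersection)."""
    intersection = {}  # missing keys play the role of "false" (lines 1-5)
    W = []
    for e in (frozenset(edge) for edge in edges):              # line 6
        cores = [frozenset(c) for c in combinations(sorted(e), d - 1)]
        if not any(intersection.get(c, False) for c in cores):  # line 7
            W.append(e)                                         # line 8
            for c in cores:                                     # line 9
                intersection[c] = True                          # line 10
                intersection[e - c] = True                      # line 11
    return W, intersection

# Hypothetical 3-uniform instance.
E3 = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}, {1, 4, 5}]
W, marked = maximal_weakly_related(E3, d=3)
```

Here {2, 3, 4} is rejected because it shares the pair {2, 3} with {1, 2, 3}, and {1, 4, 5} is rejected because it shares {4, 5} with {4, 5, 6}. The singleton entries of `marked` record which vertices occur in W, which is the information used later to identify the independent set I.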

We can now prove Theorem 2 by showing how to compute a problem kernel with
O(k^{d-1}) vertices in O(|V| + |E| + k^{1.5d}) time.

Proof (of Theorem 2). We may assume that the hypergraph H = (V, E) in an
instance of d-Hitting Set satisfies |V| + |E| ∈ O(k^d) and contains sunflowers with
at most d(k + 1) petals, since otherwise, using Algorithm 1, we can transform H
accordingly in linear time, as stated by Lemma 3 and Theorem 1. To reduce
the number of vertices in H to O(k^{d-1}), we follow the approach of Moser [16,
Lemma 7.16] as discussed in the beginning of this section.

First, compute a maximal weakly related set W in H in O(|V| + |E|) = O(k^d) time
using Algorithm 2. We show that |W| ∈ O(k^{d-1}). Because every pair of edges in W
intersects in at most d - 2 vertices, by Lemma 3 and Lemma 4 for b = 2 and
c = d(k + 1), we know that the hypergraph (V, W_ℓ) for ℓ ≥ 2, where W_ℓ is the set of
cardinality-ℓ edges in W, has at most O(k^{d-1}) edges. Moreover, W_1 contains at most
O(k) edges, as they form a sunflower with empty core. Therefore, |W| ∈ O(k^{d-1}).
Next, we construct a bipartite graph B = (I ⊎ S, E'), where

(i) I is the set of vertices in V not contained in any edge in W, forming an
independent set [2, 16],

(ii) S := {e ⊆ V | ∃v ∈ I: ∃w ∈ W: e ⊆ w, {v} ∪ e ∈ E}, and

(iii) E' := {{v, e} | v ∈ I, e ∈ S, {v} ∪ e ∈ E}.

This can be done in O(|E|) = O(k^d) time as follows: for each e ∈ E with |e| = d
and each v ∈ e, add {v, e \ {v}} to E' if and only if intersection[e \ {v}] = true and
intersection[{v}] = false. In this case, it follows that e is separable into

(i) a subset e \ {v} of an edge of W, since intersection[e \ {v}] = true, and

(ii) the vertex v that is not contained in any edge in W and, hence, contained in I,
since intersection[{v}] = false.

Thus, e clearly satisfies the definition of E'. Finally, for each edge {v, C} added
to E', add v to I and C to S. Herein, checking that an element is not added to I
or S multiple times can be done in O(d) time per element: to this end, we use a trie
data structure similarly as "petals[]" in Lemma 2 or "intersection[]" in Algorithm 2.
Similarly to Lemma 2, the trie can be initialized in linear time, since we know the
elements to be added to I and S in advance.

It remains to shrink I to O(k^{d-1}) vertices by computing a maximum matching
in B and deleting from H the unmatched vertices in I and the edges containing
them. However, note that by construction of B, for each edge in H, we add at most
one edge and two vertices to B. Therefore, B has O(k^d) edges and vertices. Hence,
a maximum matching on B can be computed in O(√|I ⊎ S| · |E'|) = O(k^{1.5d}) time
using the algorithm of Hopcroft and Karp [22, Theorem 16.4]. □
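
The final matching step can be illustrated as follows. Note the swapped-in technique: for brevity, the sketch uses the simple augmenting-path method instead of the algorithm of Hopcroft and Karp, so it computes a maximum matching but does not attain the O(√|I ⊎ S| · |E'|) bound. The function name and the toy graph are hypothetical.

```python
def maximum_matching(adj):
    """Maximum matching in a bipartite graph via simple augmenting paths.

    `adj` maps each vertex of I to the list of its neighbors in S.
    Returns a dict mapping each matched element of S to its partner in I.
    """
    partner = {}  # element of S -> matched vertex of I

    def try_augment(v, seen):
        for s in adj[v]:
            if s in seen:
                continue
            seen.add(s)
            # Use s if it is free or if its current partner can be re-matched.
            if s not in partner or try_augment(partner[s], seen):
                partner[s] = v
                return True
        return False

    for v in adj:
        try_augment(v, set())
    return partner

# Hypothetical bipartite graph B: I = {a, b, c}, S = {x, y}.
B = {"a": ["x"], "b": ["x"], "c": ["x", "y"]}
matching = maximum_matching(B)
unmatched = set(B) - set(matching.values())  # vertices of I to delete from H
```

In this toy graph, a and b compete for x, so one of them stays unmatched; following Moser [16, Lemma 7.16], such unmatched vertices of I are deleted together with the edges containing them.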

4 Conclusion 

We have improved the running times of the O(k^{d-1})-vertex problem kernels for
d-Hitting Set by Abu-Khzam [2] and Moser [16]. To this end, we showed, as
claimed earlier by Niedermeier and Rossmanith [18], that a polynomial-size problem
kernel for d-Hitting Set can be computed in linear time, more specifically,
a problem kernel comprising O(k^d) hyperedges and vertices. In contrast to these
problem kernels, our algorithm maintains expressiveness by finding, in the form
of sunflowers, concise explanations of potential problem solutions. However, the
constant hidden in our O(k^{d-1})-bound on the number of vertices is d! · d^{d+2} and
therefore higher than the constant 2d - 1 obtained by Abu-Khzam [2]. This is
due to the fact that our upper bound on the size of the weakly related set W
comes from the sunflower lemma (Lemma 4), whereas Abu-Khzam [2] executes
more effective data reduction on W. Regarding these constants, first experiments
with an implementation of our algorithm show that the data reduction is indeed
effective. It would be interesting to know whether a problem kernel with O(k^{d-1})
vertices and O(k^d) edges for d-Hitting Set can be computed in linear time. This
would merge the best known results for problem kernels for d-Hitting Set. However,
all known O(k^{d-1})-vertex problem kernels for d-Hitting Set, that is, the problem
kernels by Abu-Khzam [2] and Moser [16, Section 7.3], involve the computation of
maximum matchings. This seems to be a bottleneck that is difficult to avoid.

Acknowledgment. The author is very thankful to Rolf Niedermeier for many
valuable comments.